CDF confidence bands - DKW inequality

2022-03-08 · 2 min read

$$ \renewcommand{\Pr}[1]{\text{Pr}\left[ #1 \right]} \def\eps{\varepsilon} $$

Overview #

Let's say I have a well known distribution defined by a CDF $F(x)$. I also sample an empirical CDF $F_n(x)$ from a population. Given some confidence parameter, the DKW Inequality bounds how close a random ECDF $F_n(x)$ will be to the true CDF from which the samples are drawn.

Inequality #

The inequality itself states the probability that the Kolmogorov-Smirnov Statistic violates some bound (the K-S statistic is just $||F_n(x) - F(x)||_{\text{max}}$)

$$ \Pr{\sup_{x} ; \Bigl|F_n(x) - F(x)\Bigr| > \eps} \le 2 e^{-2 n \eps^2} \qquad \text{for every } \eps > 0. $$

Note: there is an extended inequality for a multivariate $F$. The bound changes from $2 e^{(\cdots)}$ to $2k , e^{(\cdots)}$, where $k$ is the vector dimension.

CDF Confidence Bands #

We can use the DKW Inequality to produce a "band" around our true CDF that must contain our empirical CDF with some confidence probability.

With a failure probability $\alpha$, our sampled ECDF will sit outside the interval

$$ \Bigl[ F(x) - \eps,; F(x) + \eps \Bigr] \qquad \text{where } \eps = \sqrt{\frac{\ln \frac{2}{\alpha}}{2n}} $$

Pros #

  1. allows us to contain the entire CDF with a single band value.
  2. non-parametric - requires no assumptions about the sample distribution.

Cons #

  1. provides weak guarantees at the tails, tighter bounds around the median
  2. pointwise approaches (what?) can provide tighter bounds at each individual value
  3. non-parametric - provides weaker bounds than others with more assumptions.

Example Bounds #

$n$$\alpha$$\eps$
100.10.38702275602049496
100.010.5146997846583985
100.0010.6164779987778186
1000.10.12238734153404082
1000.010.16276236307187292
1000.0010.19494746035204052
10000.10.038702275602049495
10000.010.05146997846583985
10000.0010.06164779987778186
100000.10.012238734153404082
100000.010.016276236307187292
100000.0010.019494746035204052
200000.10.008654091913011426
200000.010.011509037065006824
200000.0010.013784867119002347
  1. DKW Inequality (wikiwand)
  2. ENS PSL Fact Sheet 1: DKW Inequality
  3. On the tight constant in the multivariate DKW Inequality
  4. Kolmogorov-Smirnov Statistic (wikiwand)